SHRINKAGE PRIORS FOR BAYESIAN PREDICTION

By Fumiyasu Komaki 

University of Tokyo 

We investigate shrinkage priors for constructing Bayesian predic- 
tive distributions. It is shown that there exist shrinkage predictive 
distributions asymptotically dominating Bayesian predictive distri- 
butions based on the Jeffreys prior or other vague priors if the model 
manifold satisfies some differential geometric conditions. Kullback- 
Leibler divergence from the true distribution to a predictive distri- 
bution is adopted as a loss function. Conformal transformations of 
model manifolds corresponding to vague priors are introduced. We 
show several examples where shrinkage predictive distributions dom- 
inate Bayesian predictive distributions based on vague priors. 

1. Introduction. Suppose that we have a set of independent observations $x^{(N)} = (x(1), x(2), \ldots, x(N))$ from a distribution with density $p(x|\theta)$ that belongs to a model $\{p(x|\theta) \mid \theta = (\theta^1, \theta^2, \ldots, \theta^d) \in \Theta\}$. An unobserved variable $y := x(N+1)$ from the same distribution $p(y|\theta)$ is predicted by using a predictive density $\hat{p}(y; x^{(N)})$.

We adopt the Kullback-Leibler divergence $D\{p(y|\theta), \hat{p}(y; x^{(N)})\} := \int p(y|\theta) \log\{p(y|\theta)/\hat{p}(y; x^{(N)})\}\,dy$, which has a natural information theoretic meaning, as a loss function. We evaluate the performance of predictive distributions by using the risk function $E[D(p, \hat{p})|\theta] = \int p(x^{(N)}|\theta) \int p(y|\theta) \log\{p(y|\theta)/\hat{p}(y; x^{(N)})\}\,dy\,dx^{(N)}$.

A widely used method to construct a predictive density is to use a plug-in density $p(y|\hat\theta(x^{(N)}))$, where $\hat\theta(x^{(N)})$ is an appropriate estimator of $\theta$. However, Bayesian predictive densities

$$p_\pi(y|x^{(N)}) := \frac{\int p(y|\theta)\,p(x^{(N)}|\theta)\,\pi(\theta)\,d\theta}{\int p(x^{(N)}|\theta)\,\pi(\theta)\,d\theta}$$

have better performance than plug-in distributions in many examples [1, 12, 18].
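As a concrete illustration of this comparison, consider the $d$-dimensional normal model with identity covariance and the uniform prior, for which both risks have closed forms. The sketch below is not taken from the paper: it assumes identity covariance, and the function names `plugin_risk` and `bayes_risk_uniform_prior` are hypothetical. It relies on the standard facts that the plug-in density is $N(\bar{x}, I)$ and the Bayesian predictive density under the uniform prior is $N(\bar{x}, (1 + 1/N) I)$.

```python
import math


def plugin_risk(n_obs: int, dim: int = 1) -> float:
    """KL risk of the plug-in density N(xbar, I) for the N(mu, I) model."""
    # KL(N(mu, I) || N(xbar, I)) = ||xbar - mu||^2 / 2 and E||xbar - mu||^2 = dim / n_obs.
    return dim / (2.0 * n_obs)


def bayes_risk_uniform_prior(n_obs: int, dim: int = 1) -> float:
    """KL risk of the Bayesian predictive density N(xbar, (1 + 1/n_obs) I)."""
    s2 = 1.0 + 1.0 / n_obs  # predictive variance per coordinate
    # KL(N(mu, I) || N(xbar, s2 I))
    #   = (dim / 2) * (log s2 + 1/s2 - 1) + ||xbar - mu||^2 / (2 s2),
    # averaged over xbar ~ N(mu, I / n_obs).
    variance_part = 0.5 * dim * (math.log(s2) + 1.0 / s2 - 1.0)
    mean_part = dim / (2.0 * n_obs * s2)
    return variance_part + mean_part


if __name__ == "__main__":
    for n in (1, 5, 50):
        print(n, plugin_risk(n, dim=3), bayes_risk_uniform_prior(n, dim=3))
```

Under these assumptions the Bayesian predictive risk simplifies to $(d/2)\log(1 + 1/N)$, which is smaller than the plug-in risk $d/(2N)$ for every $N \geq 1$.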
In the present paper we investigate the use of shrinkage priors for constructing
Bayesian predictive distributions asymptotically dominating those
based on improper vague priors such as the Jeffreys prior. 

There exist many studies on shrinkage estimators where elliptic operators 
including the Laplacian and (super) harmonic functions play important roles 
[5, 7, 10, 14, 23, 24, 25]. 

Recently, several results suggesting that shrinkage priors are useful for 
various prediction problems have been obtained; see [13, 19] for the nor- 
mal model and [21] for the Poisson model. Further studies on more general 
models are required. 

In the present paper construction methods for shrinkage priors are introduced and their properties are investigated from the viewpoint of information geometry by using the results of previous studies on asymptotic properties of predictive distributions [8, 15, 18, 28]. The model $\{p(x|\theta) \mid \theta \in \Theta\}$ is regarded as a manifold, and the relation between differential geometric properties of the model manifold and the existence of shrinkage priors is studied.
It is shown that there exist useful shrinkage priors if the model manifold 
satisfies some differential geometric conditions. The geometrical approach 
is useful to investigate Bayesian methods because prior distributions are 
naturally regarded as volume elements on model manifolds. 

In Section 2 we show that there exist shrinkage predictive distributions 
asymptotically dominating the Bayesian predictive distribution based on the 
Jeffreys prior if the model manifold endowed with the Fisher metric satis- 
fies some differential geometric conditions. In Section 3 we introduce confor- 
mal transformations of model manifolds corresponding to prior distributions 
and show that there exist shrinkage predictive distributions asymptotically 
dominating Bayesian predictive distributions based on various priors if the 
transformed model manifolds satisfy some differential geometric conditions. 
In Section 4 we show several examples where shrinkage predictive distribu- 
tions constructed by using the introduced methods asymptotically or exactly 
dominate Bayesian predictive distributions based on vague priors. 

2. Shrinkage priors asymptotically dominating the Jeffreys prior. First, we present some differential geometric notions and notation to be used. In the following, we assume that the model manifold $M$ is a $d$ ($\geq 2$)-dimensional connected and orientable $C^\infty$ manifold. The parameter space $\Theta$ is regarded as a coordinate system of $M$. We use Einstein's summation convention: if an index occurs twice in any one term, once as an upper and once as a lower index, summation over that index is implied.

The Fisher metric tensor is defined by $g_{ij}(\theta) := E[\partial_i \log p(x|\theta)\,\partial_j \log p(x|\theta) \mid \theta]$, where $\partial_i := \partial/\partial\theta^i$. The coefficients of the $\alpha$-connection are defined by $\overset{\alpha}{\Gamma}{}^{k}_{ij}(\theta) := \overset{0}{\Gamma}{}^{k}_{ij}(\theta) - (\alpha/2)\,T_{ij}{}^{k}(\theta)$, where $\overset{0}{\Gamma}{}^{k}_{ij} := \frac{1}{2}(\partial_i g_{jl}(\theta) + \partial_j g_{il}(\theta) - \partial_l g_{ij}(\theta))\,g^{kl}(\theta)$ are the coefficients of the Riemannian connection, $T_{ijk}(\theta) := E[\partial_i \log p(x|\theta)\,\partial_j \log p(x|\theta)\,\partial_k \log p(x|\theta) \mid \theta]$ is the skewness tensor, and $g^{ij}$ denotes the $(i, j)$-component of the inverse matrix of $(g_{ij})$; see [2] for details. The $-1$-connection and $1$-connection are called the m-connection and e-connection and their coefficients are denoted by $\overset{m}{\Gamma}{}^{k}_{ij}$ and $\overset{e}{\Gamma}{}^{k}_{ij}$, respectively. The $\alpha$-covariant derivative of a vector field $v$ is defined by $\overset{\alpha}{\nabla}_i v^j := \partial_i v^j + \overset{\alpha}{\Gamma}{}^{j}_{ik} v^k$, and $\overset{-1}{\nabla}$ and $\overset{1}{\nabla}$ are denoted by $\overset{m}{\nabla}$ and $\overset{e}{\nabla}$, respectively.

The Laplacian $\Delta$ on a manifold $(M, g)$ endowed with a Riemannian metric $g_{ij}$ is defined by

$$\Delta f := |g|^{-1/2}\,\partial_i(|g|^{1/2} g^{ij}\,\partial_j f) = \overset{0}{\nabla}_i(g^{ij}\,\partial_j f),$$

where $f$ is a real function on $M$, and $|g|$ is the determinant of the matrix $(g_{ij})$.

A continuous function $G(\xi, \theta)$ of $\xi$ and $\theta$ on $M \times M - \{\theta = \xi\}$ is called a Green function if it satisfies the following conditions (see, e.g., [4, 27]):

1. $\Delta_\theta G(\xi, \theta) = 0$ for all $\xi \in M$ and $\theta \neq \xi$, where $\Delta_\theta$ denotes the Laplacian with respect to $\theta$.

2. $G(\xi, \theta) > 0$.

3. In a neighborhood of $\xi$, $G(\xi, \theta)$ has the singularity

$\{(d-2)\,\omega_{d-1}\}^{-1}\,\mathrm{dis}(\xi, \theta)^{-(d-2)}$, $d \geq 3$,
$-(1/2\pi)\log \mathrm{dis}(\xi, \theta)$, $d = 2$,

where $\omega_{d-1} := 2\pi^{d/2}/\Gamma(d/2)$ is the area of the $(d-1)$-dimensional unit sphere and $\mathrm{dis}(\xi, \theta)$ denotes the Riemannian distance between $\xi$ and $\theta$.

4. There exists a positive number $\delta > 0$ such that $G(\xi, \theta)$ is a bounded function of $\theta \in \{\theta \mid \theta \in M, \mathrm{dis}(\xi, \theta) > \delta\}$.
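The constants in condition 3 are easy to evaluate numerically. The following minimal sketch is not part of the paper, and its function names are hypothetical; it computes the area of the unit sphere and the coefficient of the $d \geq 3$ singularity using only the formula $2\pi^{d/2}/\Gamma(d/2)$ quoted above.

```python
import math


def unit_sphere_area(d: int) -> float:
    """Area of the (d-1)-dimensional unit sphere in R^d: 2 pi^(d/2) / Gamma(d/2)."""
    return 2.0 * math.pi ** (d / 2.0) / math.gamma(d / 2.0)


def singularity_coefficient(d: int) -> float:
    """Coefficient 1 / ((d-2) * omega_{d-1}) of dis^{-(d-2)}, valid for d >= 3."""
    if d < 3:
        raise ValueError("for d = 2 the singularity is logarithmic, not a power")
    return 1.0 / ((d - 2) * unit_sphere_area(d))


if __name__ == "__main__":
    print(unit_sphere_area(2), unit_sphere_area(3))  # circumference and sphere area
    print(singularity_coefficient(3))
```

For $d = 3$ the coefficient is $1/(4\pi)$, the familiar constant of the Newtonian potential.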

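When a Green function $G(\xi, \theta)$ exists on $(M, g)$, it is represented by

$$G(\xi, \theta) = \int_0^\infty \gamma_t(\xi, \theta)\,dt, \tag{1}$$

where $\gamma_t(\xi, \theta)$ is the minimal positive fundamental solution of the heat equation $\partial u(t, \theta)/\partial t = \Delta u(t, \theta)$; see [16, 17].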

In the following, we introduce shrinkage priors and evaluate the risk of 
Bayesian predictive distributions based on the shrinkage priors by using 
the results of previous studies on asymptotic properties of predictive distri- 
butions. Asymptotic expansions of predictive distributions are studied by 
Vidoni [28], Komaki [18] and Hartigan [15]. 

Theorem 1. Let $(M, g)$ be a model manifold endowed with the Fisher metric. If a Green function $G(\xi, \theta)$ on $(M, g)$ exists, there exist Bayesian predictive distributions asymptotically dominating the Bayesian predictive distribution based on the Jeffreys prior $\pi_J(\theta) \propto |g(\theta)|^{1/2}$. In particular, the Bayesian predictive distribution based on the prior $\pi_G(\theta)\,d\theta := G(\xi, \theta)\,\pi_J(\theta)\,d\theta$, where $\xi \in M$ is an arbitrary fixed point, asymptotically dominates the Bayesian predictive distribution based on the Jeffreys prior.

Proof. In the following we assume that $d \geq 2$ since a Green function does not exist on the manifold when $d = 1$; see [4].

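First, we assume that the true parameter value $\theta$ is different from $\xi$. The Bayesian predictive density based on a prior $f(\theta)$ can be expanded as

$$\begin{aligned}
p_f(y|x^{(N)}) ={}& p(y|\hat\theta_{\mathrm{mle}}(x^{(N)})) \\
&+ \frac{1}{2N}\, g^{ij}(\hat\theta_{\mathrm{mle}})\{\partial_i\partial_j p(y|\hat\theta_{\mathrm{mle}}) - \overset{m}{\Gamma}{}^{k}_{ij}(\hat\theta_{\mathrm{mle}})\,\partial_k p(y|\hat\theta_{\mathrm{mle}})\} \\
&+ \frac{1}{N}\, g^{ij}(\hat\theta_{\mathrm{mle}})\Big\{\partial_j \log\frac{f}{\pi_J}(\hat\theta_{\mathrm{mle}}) + \frac{1}{2}T_j(\hat\theta_{\mathrm{mle}})\Big\}\,\partial_i p(y|\hat\theta_{\mathrm{mle}}) \\
&+ o_p(N^{-1}),
\end{aligned} \tag{2}$$

where $\hat\theta_{\mathrm{mle}}(x^{(N)})$ is the maximum likelihood estimate, $T_i := T_{ijk}\,g^{jk}$ and the relation $\partial_i \log \pi_J = \partial_i \log |g|^{1/2} = \overset{0}{\Gamma}{}^{j}_{ij} = \overset{e}{\Gamma}{}^{j}_{ij} + (1/2)T_i$ is used; see [8, 15, 18].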
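The risk of the Bayesian predictive density $p_f(y|x^{(N)})$ is given by

$$\begin{aligned}
E[D(p(y|\theta), p_f(y|x^{(N)}))|\theta] ={}& \frac{d}{2N} + \frac{1}{2N^2}\, g^{ij}\Big(\partial_i \log\frac{f}{\pi_J} + \frac{1}{2}T_i\Big)\Big(\partial_j \log\frac{f}{\pi_J} + \frac{1}{2}T_j\Big) \\
&+ \frac{1}{N^2}\,\overset{e}{\nabla}_i\Big\{g^{ij}\Big(\partial_j \log\frac{f}{\pi_J} + \frac{1}{2}T_j\Big)\Big\} \\
&+ \text{the terms independent of } f + o(N^{-2});
\end{aligned} \tag{3}$$

see [15, 18].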

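Thus, the difference between the risk of $p_{\pi_J}(y|x^{(N)})$ based on the Jeffreys prior $\pi_J(\theta)$ and the risk of $p_f(y|x^{(N)})$ based on $f(\theta)$ is given by

$$\begin{aligned}
& E[D(p(y|\theta), p_{\pi_J}(y|x^{(N)}))|\theta] - E[D(p(y|\theta), p_f(y|x^{(N)}))|\theta] \\
&\quad = \frac{1}{2N^2}\, g^{ij}\,\frac{T_i}{2}\,\frac{T_j}{2} + \frac{1}{N^2}\,\overset{e}{\nabla}_i\Big(g^{ij}\,\frac{T_j}{2}\Big) \\
&\qquad - \frac{1}{2N^2}\, g^{ij}\Big(\partial_i \log\frac{f}{\pi_J} + \frac{T_i}{2}\Big)\Big(\partial_j \log\frac{f}{\pi_J} + \frac{T_j}{2}\Big) - \frac{1}{N^2}\,\overset{e}{\nabla}_i\Big\{g^{ij}\Big(\partial_j \log\frac{f}{\pi_J} + \frac{T_j}{2}\Big)\Big\} + o(N^{-2}) \\
&\quad = -\frac{1}{2N^2}\, g^{ij}\,\partial_i \log\frac{f}{\pi_J}\,\partial_j \log\frac{f}{\pi_J} - \frac{1}{N^2}\,\overset{0}{\nabla}_i\Big(g^{ij}\,\partial_j \log\frac{f}{\pi_J}\Big) + o(N^{-2}) \\
&\quad = \frac{1}{2N^2}\, g^{ij}\,\partial_i \log\frac{f}{\pi_J}\,\partial_j \log\frac{f}{\pi_J} - \frac{1}{N^2}\,\frac{\pi_J}{f}\,\Delta\frac{f}{\pi_J} + o(N^{-2}).
\end{aligned} \tag{4}$$

Therefore, if $\Delta(f/\pi_J) \leq 0$ and $\partial_i \log(f/\pi_J) \neq 0$, $p_f(y|x^{(N)})$ asymptotically dominates $p_{\pi_J}(y|x^{(N)})$. The prior density $\pi_G(\theta)$ satisfies these conditions.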
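Next, we assume that $\theta = \xi$.

Since $\int (\tilde\theta^i - \hat\theta_f^i)(\tilde\theta^j - \hat\theta_f^j)\,p_f(\tilde\theta|x^{(N)})\,d\tilde\theta = O_p(N^{-1})$, $\int (\tilde\theta^i - \hat\theta_f^i)(\tilde\theta^j - \hat\theta_f^j)(\tilde\theta^k - \hat\theta_f^k)\,p_f(\tilde\theta|x^{(N)})\,d\tilde\theta = O_p(N^{-3/2})$ and $\hat\theta_f(x^{(N)}) - \theta = O_p(N^{-1/2})$, where $\hat\theta_f(x^{(N)}) = \int \tilde\theta\,p_f(\tilde\theta|x^{(N)})\,d\tilde\theta$, we have

$$\begin{aligned}
& E[D(p(y|\theta), p_f(y|x^{(N)}))|\theta] \\
&\quad = \int p(x^{(N)}|\theta) \int p(y|\theta) \log\frac{p(y|\theta)}{p_f(y|x^{(N)})}\,dy\,dx^{(N)} \\
&\quad = -\int p(x^{(N)}|\theta) \int p(y|\theta) \log\Big[1 + \int \Big\{\frac{\partial_i p(y|\theta)}{p(y|\theta)}(\tilde\theta^i - \theta^i) + \frac{1}{2}\frac{\partial_i\partial_j p(y|\theta)}{p(y|\theta)}(\tilde\theta^i - \theta^i)(\tilde\theta^j - \theta^j)\Big\}\,p_f(\tilde\theta|x^{(N)})\,d\tilde\theta\Big]\,dy\,dx^{(N)} + O(N^{-3/2}) \\
&\quad = \frac{1}{2}\,g_{ij}(\theta) \int p(x^{(N)}|\theta)\,(\hat\theta_f^i(x^{(N)}) - \theta^i)(\hat\theta_f^j(x^{(N)}) - \theta^j)\,dx^{(N)} + O(N^{-3/2}).
\end{aligned} \tag{5}$$

This relation holds for all the three cases $f = \pi_J$, $f = \pi_G$ ($\xi = \theta$) and $f = \pi_G$ ($\xi \neq \theta$); see [22] for details.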

When $f = \pi_J$, the asymptotic distribution of $N g_{ij}(\theta)(\hat\theta_f^i(x^{(N)}) - \theta^i)(\hat\theta_f^j(x^{(N)}) - \theta^j)$ is the chi-square distribution with $d$ degrees of freedom since $p_{\pi_J}(\mu|x^{(N)}) \propto (2\pi)^{-d/2}|g_{ij}(\theta)|^{1/2}\exp\{-(1/2)g_{ij}(\theta)(\mu^i - \eta^i)(\mu^j - \eta^j)\}(1 + O_p(N^{-1/2}))$, where $\mu^i := \sqrt{N}(\tilde\theta^i - \theta^i)$ is the rescaled argument $\tilde\theta$ of the posterior density, $\eta^i := \sqrt{N}(\hat\theta^i - \theta^i)$ and $\hat\theta$ is the maximum likelihood estimator based on the observation $x^{(N)}$. Thus, the risk (5) is $d/(2N) + o(N^{-1})$, coinciding with (3).

When $d \geq 3$, the risk (5) with $f = \pi_G$ ($\xi = \theta$) is smaller than that with $f = \pi_J$ on the order of $1/N$ since $p_{\pi_G}(\mu|x^{(N)}) \propto \{g_{ij}(\theta)\mu^i\mu^j\}^{-(d-2)/2}(2\pi)^{-d/2}|g_{ij}(\theta)|^{1/2}\exp\{-(1/2)g_{ij}(\theta)(\mu^i - \eta^i)(\mu^j - \eta^j)\}(1 + O_p(N^{-1/2}))$.

When $d = 2$, we can verify that the risk (5) with $f = \pi_G$ ($\xi = \theta$) is smaller than that with $f = \pi_J$ on the order of $1/(N \log N)$ since $p_{\pi_G}(\mu|x^{(N)}) \propto \{1 - (1/\log N)\log(g_{ij}(\theta)\mu^i\mu^j)\} \times (2\pi)^{-1}|g_{ij}(\theta)|^{1/2}\exp\{-(1/2)g_{ij}(\theta)(\mu^i - \eta^i)(\mu^j - \eta^j)\}(1 + O_p((\log N)^{-2}))$.

Therefore, the Bayesian predictive distribution based on $\pi_G$ asymptotically dominates that based on the Jeffreys prior $\pi_J$. □

From (4) and the relation $(1/2)\,g^{ij}\,\partial_i \log(f/\pi_J)\,\partial_j \log(f/\pi_J) - (\pi_J/f)\,\Delta(f/\pi_J) = -2(\pi_J/f)^{1/2}\,\Delta(f/\pi_J)^{1/2}$, we have the following theorem. For the definition of superharmonic functions on Riemannian manifolds, see, for example, [16]. A $C^2$ function $f$ is superharmonic if and only if $\Delta f \leq 0$.
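The relation used here follows from a short calculation with the product rule for the Laplacian. In the sketch below, $u := f/\pi_J$ and $w := u^{1/2}$ are shorthand introduced only for this derivation.

```latex
\begin{align*}
\Delta u &= \Delta(w^2) = 2w\,\Delta w + 2\,g^{ij}\,\partial_i w\,\partial_j w, \\
g^{ij}\,\partial_i \log u\,\partial_j \log u &= \frac{4}{w^2}\,g^{ij}\,\partial_i w\,\partial_j w, \\
\frac{1}{2}\,g^{ij}\,\partial_i \log u\,\partial_j \log u - \frac{\Delta u}{u}
  &= \frac{2}{w^2}\,g^{ij}\,\partial_i w\,\partial_j w
     - \frac{2\,\Delta w}{w}
     - \frac{2}{w^2}\,g^{ij}\,\partial_i w\,\partial_j w
   = -\frac{2\,\Delta w}{w}.
\end{align*}
```

Since $w > 0$, the right-hand side is nonnegative exactly where $\Delta w \leq 0$, which is why superharmonicity of $(f/\pi_J)^{1/2}$ is the relevant condition.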

Theorem 2. Let $f(\theta)$ be a smooth prior density on a model manifold $(M, g)$ endowed with the Fisher metric. The Bayesian predictive distribution based on $f(\theta)$ asymptotically dominates the Bayesian predictive distribution based on the Jeffreys prior $\pi_J(\theta)$ if and only if $(f/\pi_J)^{1/2}$ is a nonconstant positive superharmonic function on $(M, g)$.

It is known that there exists a Green function associated with the Lapla- 
cian A if and only if there exists at least one nonconstant positive superhar- 
monic function [16] . Therefore, the existence of a Green function on (M, g) is 
necessary (and sufficient) for the existence of positive superharmonic func- 
tions on (M,g). 

It is proved by Aomoto [4] that there exists a Green function if a complete and simply connected manifold has strictly negative curvature ($d = 2$) or has negative curvature ($d \geq 3$).

The sectional curvature of the two-dimensional subspace of the tangent space $T_p$ of $M$ at $p$ spanned by $X$ and $Y$ is defined by $K(X, Y) := (R_{ijkl}X^iY^jY^kX^l)/\{(g_{ik}g_{jl} - g_{jk}g_{il})X^iY^jX^kY^l\}$, where $R_{ijkl}$ is the curvature tensor defined by

$$R_{ijkl} := \big(\partial_i \overset{0}{\Gamma}{}^{m}_{jk} - \partial_j \overset{0}{\Gamma}{}^{m}_{ik} + \overset{0}{\Gamma}{}^{m}_{in}\,\overset{0}{\Gamma}{}^{n}_{jk} - \overset{0}{\Gamma}{}^{m}_{jn}\,\overset{0}{\Gamma}{}^{n}_{ik}\big)\,g_{lm}.$$

A Riemannian manifold $M$ is said to have negative curvature if $K(X, Y) \leq 0$ for all linearly independent tangent vectors $X, Y \in T_p$ at every point $p \in M$ and to have strictly negative curvature if $K(X, Y) < -\delta$ ($\delta$ is an arbitrary positive number) for all linearly independent tangent vectors $X, Y \in T_p$ at every point $p \in M$.

Thus, we have the following theorem. 

Theorem 3. Let $(M, g)$ be a complete simply connected model manifold endowed with the Fisher metric. If $(M, g)$ has strictly negative curvature ($d = 2$) or has negative curvature ($d \geq 3$), then there exist Bayesian predictive distributions asymptotically dominating the Bayesian predictive distribution based on the Jeffreys prior.

Note that some global differential geometric properties of the model man- 
ifold are essential in the present theory, although the field of information ge- 
ometry has hitherto required only the theory of the locally characterizable 
properties of manifolds; see [3], page 1. 
3. Conformal transformations corresponding to prior distributions. We investigate methods for constructing shrinkage priors asymptotically dominating various kinds of vague priors other than the Jeffreys prior. For instance, right invariant priors are recommended over the Jeffreys priors for group models; see [20, 29]. In this section we assume that the dimension $d$ of the model manifold $M$ is greater than 2.

We introduce conformal transformations corresponding to prior densities 
and show that there exist shrinkage predictive distributions if the model 
manifold endowed with the conformally transformed metric satisfies some 
differential geometric conditions. 

A transformation of the metric tensor $g_{ij}(\theta)$ of the form $\tilde{g}_{ij}(\theta) = v(\theta)\,g_{ij}(\theta)$, where $v(\theta)$ is a positive function on $M$, is called a conformal transformation. Refer to [11] for details of conformal transformations.

From (3), the difference between the risks of the Bayesian predictive distributions based on prior densities $f$ and $h$ is

$$N^2\{E[D(p(y|\theta), p_h(y|x^{(N)}))|\theta] - E[D(p(y|\theta), p_f(y|x^{(N)}))|\theta]\} = -2\Big(\frac{h}{\pi_J}\Big)^{2/(d-2)}\Big(\frac{h}{f}\Big)^{1/2}\,\tilde\Delta\Big(\frac{f}{h}\Big)^{1/2} + o(1),$$

where $\tilde\Delta$ denotes the Laplacian corresponding to the metric $\tilde{g}_{ij}(\theta) := \{h(\theta)/\pi_J(\theta)\}^{2/(d-2)}\,g_{ij}(\theta)$. Thus, we obtain the following theorem in the same way as in the proof of Theorem 1.

Theorem 4. Let $h(\theta)$ be a smooth prior density and let $(M, \tilde{g})$ be the model manifold endowed with the metric defined by $\tilde{g}_{ij}(\theta) := \{h(\theta)/\pi_J(\theta)\}^{2/(d-2)}\,g_{ij}(\theta)$, where $\pi_J(\theta)$ is the density of the Jeffreys prior. The dimension of $M$ is assumed to be greater than 2.

If there exists a Green function $\tilde{G}(\xi, \theta)$ on the Riemannian manifold $(M, \tilde{g})$, there exist Bayesian predictive distributions asymptotically dominating the Bayesian predictive distribution based on the prior density $h(\theta)$. In particular, the Bayesian predictive distribution based on the prior density $\pi_{\tilde{G}}(\theta)\,d\theta := \tilde{G}(\xi, \theta)\,h(\theta)\,d\theta$, where $\xi \in M$ is an arbitrary fixed point, asymptotically dominates the Bayesian predictive distribution based on the prior density $h(\theta)$.
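The risk-difference formula stated before Theorem 4 can be checked from the Jeffreys-prior case by a standard conformal-change computation. The sketch below uses shorthand that is not in the original: $a := (f/\pi_J)^{1/2}$, $w := (h/\pi_J)^{1/2}$, $\phi := (f/h)^{1/2} = a/w$ and $v := (h/\pi_J)^{2/(d-2)} = w^{4/(d-2)}$. It assumes the usual transformation rule for the Laplacian under $\tilde{g}_{ij} = v\,g_{ij}$ in dimension $d$.

```latex
\begin{align*}
\tilde\Delta \phi
  &= v^{-1}\Big\{\Delta\phi + \frac{d-2}{2}\,g^{ij}\,\partial_i \log v\,\partial_j \phi\Big\}
   = v^{-1}\Big\{\Delta\phi + 2\,g^{ij}\,\partial_i \log w\,\partial_j \phi\Big\}, \\
\Delta a &= \Delta(w\phi) = w\,\Delta\phi + 2\,g^{ij}\,\partial_i w\,\partial_j \phi + \phi\,\Delta w
  \quad\Longrightarrow\quad
  \tilde\Delta\phi = \frac{\Delta a - \phi\,\Delta w}{v\,w}, \\
-2\,v\,\phi^{-1}\,\tilde\Delta\phi
  &= -\frac{2\,\Delta a}{a} + \frac{2\,\Delta w}{w}.
\end{align*}
```

The last line is the difference between the $N^{-2}$ coefficients $-2a^{-1}\Delta a$ (for $f$ against $\pi_J$) and $-2w^{-1}\Delta w$ (for $h$ against $\pi_J$), so comparing $f$ with $h$ reduces to the sign of $\tilde\Delta (f/h)^{1/2}$.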

In the same way as we proved Theorems 2 and 3, we have the following
theorems.



Theorem 5. Let $f(\theta)$ and $h(\theta)$ be smooth prior densities on a model manifold $M$ ($d \geq 3$). The Bayesian predictive distribution based on $f(\theta)$ asymptotically dominates the Bayesian predictive distribution based on $h(\theta)$ if and only if $(f/h)^{1/2}$ is a nonconstant positive superharmonic function on the model manifold $(M, \tilde{g})$ endowed with the conformally transformed metric $\tilde{g}_{ij}(\theta) := \{h(\theta)/\pi_J(\theta)\}^{2/(d-2)}\,g_{ij}(\theta)$.

Theorem 6. If a model manifold $(M, \tilde{g})$ ($d \geq 3$) endowed with the conformally transformed metric $\tilde{g}_{ij}(\theta) := \{h(\theta)/\pi_J(\theta)\}^{2/(d-2)}\,g_{ij}(\theta)$ is complete, simply connected and has negative curvature, there exist Bayesian predictive distributions asymptotically dominating the Bayesian predictive distribution based on the prior density $h(\theta)$.

4. Examples. In this section we see several examples of Bayesian predic- 
tive distributions based on shrinkage priors constructed by using the meth- 
ods introduced in the previous sections. 

In estimation theory, it is known that asymptotic domination of one estimator
over another does not always mean exact finite-sample domination
because there are examples where the convergence of the risk expansion is 
not uniform over the parameter space (see [14, 15, 23, 24, 25]), and further 
studies that bridge asymptotic and exact theories are required. The same 
difficulty exists in asymptotic prediction theory. 

Nevertheless, in the following examples, many Bayesian predictive distri- 
butions based on shrinkage priors constructed by using the asymptotic the- 
oretical methods exactly dominate Bayesian predictive distributions based 
on vague priors. Therefore, the methods introduced in the previous sections 
are useful tools to construct shrinkage predictive distributions for practical 
use. 

Example 1 (The multivariate normal model with a known covariance matrix). We consider the $d$-dimensional normal model $N_d(\mu, \Sigma)$, where $\mu = (\mu_1, \mu_2, \ldots, \mu_d) \in \mathbb{R}^d$ is an unknown mean vector and $\Sigma$ is a known variance-covariance matrix. We consider the problem of predicting $y \sim N_d(\mu, \Sigma)$ using $x^{(N)}$, that is, a set of $N$ independent observations from the same density.

The $(i, j)$-component of the Fisher information matrix $g_{ij}$ does not depend on $\mu$. We assume that $g_{ij} = 1$ for $i = j$ and $g_{ij} = 0$ for $i \neq j$ without loss of generality. Thus, the model manifold $(M, g)$ endowed with the Fisher metric is isometric to $d$-dimensional Euclidean space.

The Jeffreys prior $\pi_J(\mu) \propto 1$, which is invariant under the translation group, is commonly used as a vague prior for $\mu$. The Bayesian predictive density $p_{\pi_J}(y|x^{(N)})$ based on $\pi_J(\mu)$ is the best predictive density that is invariant under the translation group.

The minimal positive fundamental solution of the heat equation $\partial u(t, \theta)/\partial t = \Delta u(t, \theta)$ on $\mathbb{R}^d$ endowed with the usual Euclidean metric is given by $\gamma_t(\xi, \mu) = (4\pi t)^{-d/2}\exp\{-\|\mu - \xi\|^2/(4t)\}$, where $\mu, \xi \in \mathbb{R}^d$.

When $d \leq 2$, the integral (1) becomes infinite and shrinkage priors do not exist. This fact corresponds to the relation discussed by Brown [6] between the recurrence properties of Brownian motion on $\mathbb{R}^d$ and the existence of shrinkage estimators for the multivariate normal model with known covariance matrix. When $d \geq 3$, the integral, which is the Green function on $d$-dimensional Euclidean space, is given by $G(\xi, \mu) = \{\Gamma(d/2 - 1)/(4\pi^{d/2})\}\,\|\mu - \xi\|^{-(d-2)}$.

The Green prior defined by $\pi_G(\mu)\,d\mu = G(0, \mu)\,\pi_J(\mu)\,d\mu \propto \|\mu\|^{-(d-2)}\,d\mu$ coincides with Stein's prior $\pi_S(\mu)$ [26]. By Theorem 1, the Bayesian predictive distribution $p_{\pi_G}(y|x^{(N)})$ based on Stein's prior asymptotically dominates $p_{\pi_J}(y|x^{(N)})$. These asymptotic results also hold for general multivariate location models.
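The integral representation (1) can be checked numerically for the Euclidean heat kernel. The following sketch is an illustration only and is not part of the paper; the function names and the quadrature settings (trapezoid rule in $u = \log t$ on a truncated interval) are assumptions chosen for simplicity.

```python
import math


def heat_kernel(t: float, r: float, d: int) -> float:
    """Euclidean heat kernel (4 pi t)^(-d/2) exp(-r^2 / (4 t)) at distance r."""
    return (4.0 * math.pi * t) ** (-d / 2.0) * math.exp(-r * r / (4.0 * t))


def green_by_quadrature(r: float, d: int, log_t_min: float = -12.0,
                        log_t_max: float = 60.0, steps: int = 20000) -> float:
    """Trapezoid rule for the integral of heat_kernel over t, using u = log t."""
    h = (log_t_max - log_t_min) / steps
    total = 0.0
    for k in range(steps + 1):
        t = math.exp(log_t_min + k * h)
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * heat_kernel(t, r, d) * t  # dt = t du
    return total * h


def green_closed_form(r: float, d: int) -> float:
    """Gamma(d/2 - 1) / (4 pi^(d/2)) * r^(-(d-2)), valid for d >= 3."""
    return math.gamma(d / 2.0 - 1.0) / (4.0 * math.pi ** (d / 2.0)) * r ** (-(d - 2))


if __name__ == "__main__":
    print(green_by_quadrature(1.0, 3), green_closed_form(1.0, 3))
    # For d = 2 the truncated integral keeps growing with the upper limit.
    print(green_by_quadrature(1.0, 2, log_t_max=40.0),
          green_by_quadrature(1.0, 2, log_t_max=80.0))
```

For $d \geq 3$ the quadrature agrees with the closed form, while for $d = 2$ the truncated integral grows roughly linearly in the upper limit of $\log t$, reflecting the divergence of (1).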

The explicit form of $p_{\pi_G}(y|x^{(N)})$ for the $d$-dimensional normal model was obtained and it was shown that $p_{\pi_G}(y|x^{(N)})$ exactly dominates $p_{\pi_J}(y|x^{(N)})$ for arbitrary $N > 0$; see [19]. Recently, George, Liang and Xu [13] showed that the Bayesian predictive distribution based on a prior density $f$ exactly dominates $p_{\pi_J}(y|x^{(N)})$ when $N$ is sufficiently large if and only if $\sqrt{f}$ is a positive superharmonic function. This result for the multivariate normal model corresponds to Theorem 2.
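For Stein's prior the superharmonicity condition can be verified directly from the radial form of the Euclidean Laplacian, $\Delta \|\mu\|^k = k(k + d - 2)\|\mu\|^{k-2}$ for $\mu \neq 0$. The sketch below is not from the paper and its function names are hypothetical. It assumes the normalization $g_{ij} = \delta_{ij}$ and $\pi_J \propto 1$ of this example, and it evaluates $-2(\pi_J/f)^{1/2}\Delta(f/\pi_J)^{1/2}$, which by the relation preceding Theorem 2 is the coefficient of $N^{-2}$ in the risk difference (4).

```python
def laplacian_of_radial_power(k: float, d: int, r: float) -> float:
    """Euclidean Laplacian of mu -> ||mu||**k in R^d, evaluated at ||mu|| = r > 0."""
    return k * (k + d - 2) * r ** (k - 2)


def stein_improvement_coefficient(d: int, r: float) -> float:
    """-2 * Laplacian(sqrt(u)) / sqrt(u) for u = ||mu||**(-(d-2)) at ||mu|| = r."""
    k = -(d - 2) / 2.0  # exponent of sqrt(u)
    return -2.0 * laplacian_of_radial_power(k, d, r) / r ** k


if __name__ == "__main__":
    d, r = 3, 2.0
    print(laplacian_of_radial_power(-(d - 2), d, r))        # Green prior density: harmonic
    print(laplacian_of_radial_power(-(d - 2) / 2.0, d, r))  # its square root: superharmonic
    print(stein_improvement_coefficient(d, r))
```

Under these assumptions the coefficient equals $(d-2)^2/(2\|\mu\|^2)$, so the asymptotic improvement is largest near the origin, the point toward which the prior shrinks.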

Next, we consider the conformal transformation corresponding to the Green prior $\pi_G(\mu)\,d\mu$. Here we assume that the parameter space is $\Theta = \mathbb{R}^d - \{0\}$ for simplicity. Then the model manifold $M$ is homeomorphic to the "cylinder" $S^{d-1} \times \mathbb{R}$, where $S^{d-1}$ denotes the $(d-1)$-dimensional unit sphere. The conformal transformation corresponding to the prior $\pi_G$ is given by $\tilde{g}_{ij} = (\pi_G(\mu)/\pi_J(\mu))^{2/(d-2)}\,g_{ij} = (\sum_i \mu_i^2)^{-1}\,g_{ij}$. The Riemannian manifold $(M, \tilde{g})$ can be imbedded in $(d+1)$-dimensional Euclidean space endowed with the usual metric by the map $(\mu_1, \mu_2, \ldots, \mu_d) \mapsto ((\sum_{i=1}^d \mu_i^2)^{-1/2}\mu_1, \ldots, (\sum_{i=1}^d \mu_i^2)^{-1/2}\mu_d, (1/2)\log(\sum_{i=1}^d \mu_i^2))$.

There does not exist a Green function on the Riemannian manifold $(M, \tilde{g})$, because the integral (1) becomes infinite. Thus, a predictive distribution asymptotically dominating $p_{\pi_G}(y|x^{(N)})$ based on the Green prior cannot be constructed by using the method introduced in Section 3. This fact seems to be related to the admissibility of the shrinkage predictive distribution.

Example 2 (Location-scale models). Let $p(x)$ be a probability density on $\mathbb{R}$ that is symmetric about the origin. We consider the location-scale model $p(x|\mu, \sigma)\,dx := (1/\sigma)\,p((x - \mu)/\sigma)\,dx$, where $\mu \in \mathbb{R}$ and $\sigma > 0$ are unknown parameters. Without loss of generality, we can assume that the metric tensor coefficients are given by $g_{\mu\mu} = a/\sigma^2$, $g_{\sigma\sigma} = a/\sigma^2$ and $g_{\mu\sigma} = 0$ by rescaling $\mu$, where $a > 0$ is a constant depending on $p(x)$. The model manifold is the hyperbolic plane $H^2(-1/a)$ with constant curvature $K = -1/a$. The Laplacian on the model manifold is given by $\Delta = (\sigma^2/a)(\partial^2/\partial\mu^2 + \partial^2/\partial\sigma^2)$. The Green function on $H^2(-1/a)$ is given by $G((\mu, \sigma), (0, 1)) = -(1/(2\pi))\log\tanh(\rho(\mu, \sigma)/(2\sqrt{a}))$, where $\rho(\mu, \sigma) = \mathrm{dis}((0, 1), (\mu, \sigma))$; see, for example, [4, 9].

Thus, the Bayesian predictive distribution based on the prior $\pi_G(\mu, \sigma)\,d\mu\,d\sigma \propto G((\mu, \sigma), (0, 1))\,\pi_J(\mu, \sigma)\,d\mu\,d\sigma$ asymptotically dominates the Bayesian predictive distribution based on the Jeffreys prior.

The location-scale model is a group model. The Jeffreys prior $\pi_J(\mu, \sigma) \propto 1/\sigma^2$ is the left invariant prior. However, the best invariant predictive distribution is the Bayesian predictive distribution based on the right invariant prior $\pi_R(\mu, \sigma) \propto 1/\sigma$; see [20, 29]. Here $\pi_R/\pi_J \propto \sigma$ is a positive harmonic function on the model manifold and satisfies the condition of Theorem 2. Furthermore, there exist Bayesian predictive distributions based on positive superharmonic priors asymptotically dominating the best invariant predictive distribution. The details of this topic will be discussed in another paper.
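The claim about the right invariant prior can be checked against the Laplacian $(\sigma^2/a)(\partial^2/\partial\mu^2 + \partial^2/\partial\sigma^2)$ given in this example. The sketch below is illustrative and not from the paper: the function names are hypothetical and the derivatives are approximated by central finite differences with an arbitrarily chosen step size.

```python
import math


def hyperbolic_laplacian(f, mu: float, sigma: float, a: float = 1.0,
                         h: float = 1e-3) -> float:
    """(sigma^2 / a) * (f_mumu + f_sigmasigma), via central finite differences."""
    d2_mu = (f(mu + h, sigma) - 2.0 * f(mu, sigma) + f(mu - h, sigma)) / (h * h)
    d2_sigma = (f(mu, sigma + h) - 2.0 * f(mu, sigma) + f(mu, sigma - h)) / (h * h)
    return sigma * sigma / a * (d2_mu + d2_sigma)


def prior_ratio(mu: float, sigma: float) -> float:
    """pi_R / pi_J up to a constant factor."""
    return sigma


def sqrt_prior_ratio(mu: float, sigma: float) -> float:
    """Square root of the ratio, the function appearing in Theorem 2."""
    return math.sqrt(sigma)


if __name__ == "__main__":
    print(hyperbolic_laplacian(prior_ratio, 0.3, 2.0))       # approximately 0: harmonic
    print(hyperbolic_laplacian(sqrt_prior_ratio, 0.3, 2.0))  # negative: superharmonic
```

Analytically, $\Delta\sigma^k = k(k-1)\sigma^k/a$, so $\sigma$ is harmonic and $\sigma^{1/2}$ has Laplacian $-\sigma^{1/2}/(4a) < 0$, consistent with the numerical values.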

Example 3 (The $2 \times 2$ Wishart model $W_2(m, \Sigma)$). Suppose that we have a set of independent observations $X(1), X(2), \ldots, X(N)$ from the $2 \times 2$ Wishart distribution $W_2(m, \Sigma)$ with $m$ degrees of freedom. The density of the $2 \times 2$ Wishart distribution $W_2(m, \Sigma)$ is given by

$$p(X|\Sigma)\,dX = \frac{1}{2^m\,\Gamma_2(m/2)\,|\Sigma|^{m/2}}\,|X|^{(m-3)/2}\exp\Big(-\frac{1}{2}\,\mathrm{tr}\,\Sigma^{-1}X\Big)\,dX, \qquad X > 0,\ m \geq 2,$$

where $\Gamma_2(a) := \pi^{1/2}\,\Gamma(a)\,\Gamma(a - 1/2)$. Then the distribution of the sufficient statistic $X := \sum_{l=1}^{N} X(l)$ is the $2 \times 2$ Wishart distribution $W_2(Nm, \Sigma)$ with $Nm$ degrees of freedom.

Let $(M, g)$ be a Riemannian manifold composed of $2 \times 2$ positive definite matrices endowed with the Fisher metric. The inner product of tangent vectors $A$ and $B$ at a point $\Sigma \in M$ is given by $(m/2)\,\mathrm{tr}(\Sigma^{-1}A\,\Sigma^{-1}B)$, and the Jeffreys prior is given by $\pi_J(\Sigma)\,d\Sigma \propto |\Sigma|^{-3/2}\,d\Sigma = |\Sigma|^{3/2}\,d\Sigma^{-1}$.

The posterior distribution with respect to the Jeffreys prior is the inverted Wishart distribution $W_2^{-1}(Nm, X)$. The Bayesian predictive distribution for $Y \sim W_2(m, \Sigma)$ based on the Jeffreys prior is given by

$$p_{\pi_J}(Y|X)\,dY = \frac{\Gamma((Nm+m)/2)\,\Gamma((Nm+m-1)/2)}{\pi^{1/2}\,\Gamma(Nm/2)\,\Gamma((Nm-1)/2)\,\Gamma(m/2)\,\Gamma((m-1)/2)}\;\frac{|X|^{Nm/2}\,|Y|^{(m-3)/2}}{|X+Y|^{(Nm+m)/2}}\,dY.$$

We construct a shrinkage predictive distribution asymptotically dominating $p_{\pi_J}(Y|X)$.

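We parameterize $\Sigma$ by

$$\Sigma = e^{\lambda}\begin{pmatrix}\cos\frac{\theta}{2} & -\sin\frac{\theta}{2}\\[2pt] \sin\frac{\theta}{2} & \cos\frac{\theta}{2}\end{pmatrix}\begin{pmatrix}e^{\rho} & 0\\ 0 & e^{-\rho}\end{pmatrix}\begin{pmatrix}\cos\frac{\theta}{2} & \sin\frac{\theta}{2}\\[2pt] -\sin\frac{\theta}{2} & \cos\frac{\theta}{2}\end{pmatrix}, \tag{6}$$

where $\lambda \in \mathbb{R}$, $\rho \geq 0$, $0 \leq \theta < 2\pi$. The Fisher metric is $g = m(d\lambda^2 + d\rho^2 + \sinh^2\rho\,d\theta^2)$, and the density of the Jeffreys prior is $\pi_J(\lambda, \rho, \theta)\,d\lambda\,d\rho\,d\theta = \pi_J(\Sigma)\,|\partial\Sigma/\partial(\rho, \theta, \lambda)|\,d\lambda\,d\rho\,d\theta \propto \sinh\rho\,d\lambda\,d\rho\,d\theta$.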

Let $S_{\lambda_0}$ be a submanifold of $(M, g)$ specified by $\lambda = \lambda_0$, where $\lambda_0$ is a constant. The induced metric on the submanifold $S_\lambda$ is $g = m(d\rho^2 + \sinh^2\rho\,d\theta^2)$. Thus, the Riemannian submanifold $(S_\lambda, g)$ of the model manifold $(M, g)$ is isometric to the hyperbolic plane $H^2(-1/m)$ with constant curvature $K = -1/m$. Geometric properties of the hyperbolic plane are widely known; see, for example, [9]. The Laplacian on $(M, g)$ is

$$\Delta = \frac{1}{m}\Big(\frac{\partial^2}{\partial\lambda^2} + \frac{\partial^2}{\partial\rho^2} + \frac{\cosh\rho}{\sinh\rho}\,\frac{\partial}{\partial\rho} + \frac{1}{\sinh^2\rho}\,\frac{\partial^2}{\partial\theta^2}\Big).$$

The Riemannian geometric structure of $S_{\lambda_0}$ does not depend on the value of $\lambda_0$. We identify $(S_\lambda, g)$ with $H^2(-1/m)$. The set of submanifolds $\{S_\lambda \mid \lambda \in \mathbb{R}\}$ is a foliation of the model manifold $(M, g)$.

By Theorem 3, a shrinkage prior dominating the Jeffreys prior on the model manifold $(M, g)$ exists since $(M, g)$ has negative curvature and the dimension $d$ is 3. Here we introduce a shrinkage prior based on the Green function on $(S_\lambda, g)$, which is different from the Green prior $\pi_G$ based on the Green function on $(M, g)$. The Green function on $(S_\lambda, g)$ is given by $G_\lambda(e^{\lambda}I, \Sigma) = -(1/2\pi)\log\{\tanh(\rho/2)\}$, where $|\Sigma| = \exp(2\lambda)$.

We define a function on $(M, g)$ by $h(\Sigma) := G_{(1/2)\log|\Sigma|}(|\Sigma|^{1/2}I, \Sigma)$. The function $h(\Sigma)$ is superharmonic on $(M, g)$ and satisfies $\Delta h(\Sigma) = 0$ if $\rho \neq 0$.

We introduce a shrinkage prior distribution defined by $\pi_S(\Sigma)\,d\rho\,d\theta\,d\lambda \propto h(\Sigma)\,\pi_J(\lambda, \rho, \theta)\,d\lambda\,d\rho\,d\theta \propto -(1/2\pi)\log\{\tanh(\rho/2)\}\sinh\rho\,d\lambda\,d\rho\,d\theta$. This prior "shrinks" the posterior to the submanifold of $(M, g)$ specified by $\rho = 0$.
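Because $h$ depends on $\rho$ only, its harmonicity away from $\rho = 0$ reduces to the radial part $(1/m)(\partial^2/\partial\rho^2 + \coth\rho\,\partial/\partial\rho)$ of the Laplacian on $(M, g)$. The sketch below checks this numerically; it is an illustration rather than part of the paper, with hypothetical function names and central finite differences at an arbitrarily chosen step size.

```python
import math


def h_radial(rho: float) -> float:
    """h as a function of rho: -(1 / (2 pi)) * log(tanh(rho / 2)), for rho > 0."""
    return -math.log(math.tanh(rho / 2.0)) / (2.0 * math.pi)


def radial_laplacian(f, rho: float, m: float = 1.0, step: float = 1e-3) -> float:
    """(1/m) * (f'' + coth(rho) * f') for a function of rho only."""
    d1 = (f(rho + step) - f(rho - step)) / (2.0 * step)
    d2 = (f(rho + step) - 2.0 * f(rho) + f(rho - step)) / (step * step)
    return (d2 + d1 * math.cosh(rho) / math.sinh(rho)) / m


if __name__ == "__main__":
    for rho in (0.5, 1.0, 2.0):
        print(rho, h_radial(rho), radial_laplacian(h_radial, rho, m=4.0))
```

The exact derivatives are $h'(\rho) = -(1/2\pi)/\sinh\rho$ and $h''(\rho) = (1/2\pi)\cosh\rho/\sinh^2\rho$, which cancel in the radial Laplacian, so the printed values are zero up to discretization error.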

By a discussion similar to the proof of Theorem 1, we can see that the Bayesian predictive distribution $p_{\pi_S}(Y|X)$ based on $\pi_S$ asymptotically dominates $p_{\pi_J}(Y|X)$.

In fact, the explicit Bayesian predictive distribution $p_{\pi_S}(Y|X)$ with respect to the prior $\pi_S$ exactly dominates $p_{\pi_J}(Y|X)$ based on the Jeffreys prior. The details and some other priors will be discussed in a forthcoming paper.